Setting up Google Cloud Storage as a target
Create a service account JSON key, create a Google Cloud Storage bucket, and gather the credentials needed to use Google Cloud Storage with Data Integration.
Prerequisite
Ensure you’ve signed up for the Google Cloud Platform and you have a console Admin user. If you do not have one of these prerequisites, you can start here.
Create a service account user for Data Integration
Data Integration uploads your source data to a Google Cloud Storage bucket. Create a service account in the Google Cloud Platform console with access to the relevant bucket and the relevant BigQuery project.
Procedure
- Sign in to the Google Cloud Platform console.
- Go to IAM & Admin > Service Accounts and click CREATE SERVICE ACCOUNT.
- In the CREATE SERVICE ACCOUNT page:
- Set your Service Account name (for example, Data_Integration User) and click CREATE AND CONTINUE.
- Grant the service account access to the project by setting Roles:
- Click the drop-down list and select BigQuery Admin.
- Click ADD ANOTHER ROLE and repeat the process for Storage Admin.
- Copy the Service Account ID (email) from the service account list. You enter it later in the Data Integration connection.
- Create a key for the service account:
- Go to the service account screen, locate the service account you created, and click it.
- In the service account page, click Keys.
- Click Add key.
- Choose the JSON key type and click Create.
- The JSON key file is downloaded to your computer.
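Before using the downloaded key, it can help to confirm that the file looks like a service account key. The following is a minimal sketch, not part of Data Integration: the function names are hypothetical, and the field list assumes the usual layout of a Google service account key file (`type`, `project_id`, `private_key`, `client_email`).

```python
import json

# Fields a Google service account JSON key file is expected to contain
# (assumption based on the standard key layout).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key: dict) -> list:
    """Return a list of problems found in a parsed service account key."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not key.get(name)
    ]
    # A service account key declares its type explicitly.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def load_and_check(path: str) -> list:
    """Load a key file from disk (the path is a placeholder) and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_service_account_key(json.load(handle))
```

An empty list means the file has the expected fields. The `client_email` value is the Service Account ID that you copied earlier.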
Enabling Cloud Storage and GCS API
- Go to APIs & Services and click ENABLE APIS AND SERVICES.
- Search for Google Cloud Storage JSON API and click Enable API.
Creating a Google Cloud Storage bucket
Data Integration uses a Google Cloud Storage bucket as a FileZone to stage your data before loading it into BigQuery. You can use the FileZone bucket or its objects as a base for other Hadoop or Apache Spark operations, such as those provided by Google Dataproc or your other services.
Procedure
- Sign in to the Google Cloud Platform console.
- Go to Storage > Browse, and click CREATE BUCKET.
- In the CREATE BUCKET page:
- Set Bucket Name, for example: project_name_data_integration_file_zone.
- Set your Bucket to Regional (Multi-Region is not stable for loading) and choose your preferred location.
- Click CREATE.
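A bucket name that breaks the Cloud Storage naming rules is rejected when you click CREATE, so a quick local check can save a round trip. This sketch covers only a subset of the rules (length, allowed characters, and the reserved `goog` prefix and `google` substring); the function name is hypothetical, and names containing dots have additional rules that are not checked here.

```python
import re

# 3-63 characters: lowercase letters, digits, '-', '_' or '.',
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def bucket_name_problems(name: str) -> list:
    """Return the naming-rule violations found in a candidate bucket name."""
    problems = []
    if not _BUCKET_NAME.fullmatch(name):
        problems.append(
            "use 3-63 characters (lowercase letters, digits, '-', '_', '.') "
            "and start and end with a letter or digit"
        )
    if name.startswith("goog"):
        problems.append("must not start with 'goog'")
    if "google" in name:
        problems.append("must not contain 'google'")
    return problems
```

For instance, `bucket_name_problems("project_name_data_integration_file_zone")` returns an empty list, while a name with uppercase letters is flagged.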
Configuring Google Cloud Storage bucket in Data Integration
Create a new connection for Google Cloud Storage and enter the credentials of your Google Cloud Platform service account:
- Connection Name
- Project ID (available in the Google Cloud Platform Home section)
- Project Number (optional; available in the Google Cloud Platform Home section)
- Service Account Email: copy the Service Account ID from the Service Accounts page.
- Region: the location where the bucket was created.
- (Optional) Set a custom FileZone to save the data in your own staging area.
- Click Test Connection. After validating it, save the connection.
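The Project ID and Service Account Email can also be read from the downloaded JSON key, where they appear as `project_id` and `client_email`. The sketch below collects the connection values in one place before you type them into the form; the function name and the dictionary labels are hypothetical and simply mirror the fields listed above, not a Data Integration API.

```python
from typing import Optional


def connection_fields_from_key(
    key: dict,
    connection_name: str,
    region: str,
    project_number: Optional[str] = None,
) -> dict:
    """Build the connection form values from a parsed service account key."""
    fields = {
        "Connection Name": connection_name,
        "Project ID": key["project_id"],
        "Service Account Email": key["client_email"],
        "Region": region,
    }
    # Project Number is optional in the connection form.
    if project_number:
        fields["Project Number"] = project_number
    return fields
```

The `region` argument should match the location you chose when creating the bucket.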
Known issues
Sometimes the Storage Admin role does not include the storage.buckets.get permission by default.
In this case, edit your GCP roles by duplicating the Storage Admin role into a custom role. Ensure the custom role has the storage.buckets.get permission, then assign the custom role to your service account instead of Storage Admin (refer to the Create a service account user for Data Integration section).
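To confirm that a custom role carries the permission, you can inspect its definition. This is a minimal sketch that assumes you have exported the role as JSON (IAM role definitions list their permissions under `includedPermissions`); the function name is hypothetical.

```python
# Permission that the Known issues section says may be missing.
REQUIRED_PERMISSIONS = frozenset({"storage.buckets.get"})


def missing_permissions(role: dict, required=REQUIRED_PERMISSIONS) -> set:
    """Return the required permissions that the role definition lacks."""
    granted = set(role.get("includedPermissions", []))
    return set(required) - granted
```

An empty set means the role already grants everything in `REQUIRED_PERMISSIONS`; otherwise, add the returned permissions to the custom role before assigning it to the service account.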